Eleftherios Avramidis and Jonas Kuhn: Exploiting XLE's finite state interface in LFG-based statistical machine translation

Authors

  • Eleftherios Avramidis
  • Jonas Kuhn
  • Miriam Butt
  • Tracy Holloway
Abstract

We present the addition of a morphological generation component to an LFG-based statistical machine translation system, taking advantage of existing morphological grammars and the FST (Finite State Transducer) processing pipeline of the XLE system. The extended syntax-driven translation system makes separate stochastic decisions for lemmata and morphological tags; the role of the finite-state morphological grammars is to generate full forms from the bundle of morphological tags produced by the translation component. This technique can lead to more effective use of a given amount of training data from a parallel corpus, since lexical and morphosyntactic translation patterns can be induced independently. The existing FST processing cascade for German, when added to the statistical machine translation system, suffers from generation failures. These occur due to overgeneralisation by the syntax-driven translation process and originate from (i) the use of various underspecification tags in the morphological grammar, or (ii) erroneous assignment of certain tags to a given lemma. To deal with this, we add a set of replacement/correction rules on top of the cascade. The augmented FST cascade increases generation coverage from 47.90% to 75.35%. A detailed error analysis for the remaining 24.65% is given.
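The generation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and does not use XLE or any real finite-state toolkit: a plain dictionary stands in for the morphological transducer, and the tag names, the `REWRITES` rule table, and the functions `generate` and `coverage` are all hypothetical. The sketch assumes that a correction rule simply offers alternative tags to try when the exact lemma-plus-tags bundle cannot be generated.

```python
from itertools import product

# Toy stand-in for a finite-state morphological generator (assumption: a real
# system would query a transducer). Maps (lemma, tag tuple) to a full form.
LEXICON = {
    ("Haus", ("+Noun", "+Neut", "+Nom", "+Sg")): "Haus",
    ("Haus", ("+Noun", "+Neut", "+Gen", "+Sg")): "Hauses",
    ("Haus", ("+Noun", "+Neut", "+Dat", "+Pl")): "Häusern",
}

# Hypothetical replacement/correction rules: each tag maps to the alternatives
# tried in its place when direct generation fails.
REWRITES = {
    "+Case?": ("+Nom", "+Gen", "+Dat", "+Acc"),  # underspecified case tag
    "+Masc": ("+Masc", "+Neut"),                 # possibly wrong gender tag
}


def generate(lemma, tags, use_rules=True):
    """Return a full form for the lemma and tag bundle, or None on failure."""
    form = LEXICON.get((lemma, tuple(tags)))
    if form is not None or not use_rules:
        return form
    # Expand every tag into its alternatives and return the first hit.
    options = [REWRITES.get(tag, (tag,)) for tag in tags]
    for candidate in product(*options):
        form = LEXICON.get((lemma, candidate))
        if form is not None:
            return form
    return None


def coverage(requests, use_rules):
    """Fraction of requests for which some full form is generated."""
    hits = sum(generate(lemma, tags, use_rules) is not None
               for lemma, tags in requests)
    return hits / len(requests)


# Illustrative requests, as a translation component might emit them.
REQUESTS = [
    ("Haus", ("+Noun", "+Neut", "+Dat", "+Pl")),    # generates directly
    ("Haus", ("+Noun", "+Masc", "+Gen", "+Sg")),    # erroneous gender tag
    ("Haus", ("+Noun", "+Neut", "+Case?", "+Sg")),  # underspecified case
    ("Hund", ("+Noun", "+Masc", "+Nom", "+Sg")),    # lemma not in the lexicon
]

if __name__ == "__main__":
    print(coverage(REQUESTS, use_rules=False))
    print(coverage(REQUESTS, use_rules=True))
```

In this toy setting the rules recover the two requests that fail only because of an underspecified or wrong tag, while the unknown lemma still fails. Taking the first matching alternative is a simplification made for the sketch; the abstract does not say how the actual rules choose among candidates.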


Similar resources

Deep Grammars in a Tree Labeling Approach to Syntax-based Statistical Machine Translation

In this paper, we propose a new syntax-based machine translation (MT) approach based on reducing the MT task to a tree-labeling task, which is further decomposed into a sequence of simple decisions for which discriminative classifiers can be trained. The approach is very flexible and we believe that it is particularly well-suited for exploiting the linguistic knowledge encoded in deep grammars wh...


MT-ComparEval: Graphical evaluation interface for Machine Translation development

The tool described in this article has been designed to help MT developers by implementing a web-based graphical user interface that allows them to systematically compare and evaluate various MT engines/experiments using comparative analysis via automatic measures and statistics. The evaluation panel provides graphs, tests for statistical significance and n-gram statistics. We also present a demo serv...


Joshua 6: A phrase-based and hierarchical statistical machine translation system

We describe the version six release of Joshua, an open-source statistical machine translation toolkit. The main difference from release five is the introduction of a simple, unlexicalized, phrase-based stack decoder. This phrase-based decoder shares a hypergraph format with the syntax-based systems, permitting a tight coupling with the existing codebase of feature functions and hypergraph tools...


Enriching Morphologically Poor Languages for Statistical Machine Translation

We address the problem of translating from morphologically poor to morphologically rich languages by adding per-word linguistic information to the source language. We use the syntax of the source sentence to extract information for noun cases and verb persons and annotate the corresponding words accordingly. In experiments, we show improved performance for translating from English into Greek an...


Correlating decoding events with errors in Statistical Machine Translation

This work investigates situations in the decoding process of Phrase-based SMT that cause particular errors on the output of the translation. A set of translations postedited by professional translators is used to automatically identify errors based on edit distance. Binary classifiers predicting the sentence-level existence of an error are fitted with Logistic Regression, based on features from...




Publication date: 2010